Cluster #17

HendrikSchultheis · 2018-12-19T16:34:39Z

Renamed reduce_bed to reduce_sequence (@renewiegandt please check whether I forgot to change anything in the Nextflow pipeline)
Better error messages and documentation
Both scripts (reduce_sequence and cdhit_wrapper) now check whether a header is provided with the data. If so, the header is forwarded to the output.
reduce_sequence checks whether jellyfish is installed (Linux only)
cdhit_wrapper checks whether cd-hit is installed (Linux only)
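
For illustration, here is a minimal sketch of the header check, assuming a header counts as "provided" whenever the column names differ from the default V1, V2, ... names that data.table gives headerless input. The helpers `has_custom_header()` and `write_with_optional_header()` are hypothetical and not taken from the scripts in this PR.

```r
# Hypothetical helper: TRUE if the table carries a real header, FALSE if it
# only has the default names (V1, V2, ...) used for headerless input.
has_custom_header <- function(dt) {
  default_names <- paste0("V", seq_along(dt))
  !identical(names(dt), default_names)
}

# Hypothetical helper: write tab-separated output and forward the header
# only when one was supplied with the input.
write_with_optional_header <- function(dt, file) {
  utils::write.table(dt, file, sep = "\t", quote = FALSE,
                     row.names = FALSE, col.names = has_custom_header(dt))
}

# Usage with data.table (assumed to be installed; file names are examples):
# dt <- data.table::fread("input.bed")
# write_with_optional_header(dt, "reduced.bed")
```

Deciding on the column names rather than on a separate flag keeps the scripts' command line unchanged.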

So far, the check for the required tools only works on Linux. For now this should be enough, but tell me if I should look into making it platform independent.
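
One possible way to make the check platform independent is base R's `Sys.which()`, which returns the full path of an executable found on the PATH or an empty string otherwise, and is documented to work on Windows as well. The sketch assumes the binaries are called `jellyfish` and `cd-hit`; `check_tools()` is a hypothetical helper, not code from this PR.

```r
# Hypothetical helper: stop with a readable error if any required executable
# is not on the PATH. Sys.which() is base R and also works on Windows.
check_tools <- function(tools) {
  paths <- Sys.which(tools)          # "" for every tool that was not found
  missing <- tools[paths == ""]
  if (length(missing) > 0) {
    stop("Required tool(s) not found on PATH: ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(paths)
}

# Usage (binary names are assumptions):
# check_tools("jellyfish")   # in reduce_sequence.R
# check_tools("cd-hit")      # in cdhit_wrapper.R
```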

renewiegandt · 2018-12-19T20:46:42Z

bin/reduce_sequence.R

-#' @return reduced bed
-#' TODO check whether jellyfish is installed
-reduce_bed <- function(input, kmer = 10, motif = 10, output = "reduced.bed", threads = NULL, clean = TRUE, minoverlap_kmer = kmer - 1, minoverlap_motif = ceiling(motif / 2), min_seq_length = max(c(motif, kmer)), motif_occurence = 1) {
+#' @details If there is a header supplied other then the default data.table nameing scheme ('V1', 'V2', etc.) it will be kept.


Typo: 'nameing' should be 'naming' ("naming scheme").

renewiegandt · 2018-12-19T20:49:39Z

bin/reduce_sequence.R

-#' @param threads Number of threads. Default = 1. 0 for all cores.
+#' @param input Input bed-file. Last column must be sequences.
+#' @param kmer Kmer length. Default = 10
+#' @param motif Estimated motif length. Default = 10


The parameter name 'motif' says little about its purpose.
Maybe rename it to something like 'estimated_motif_len'?

renewiegandt · 2018-12-19T20:50:38Z

bin/reduce_sequence.R

-#' @param output Output file
-#' @param threads Number of threads. Default = 1. 0 for all cores.
+#' @param input Input bed-file. Last column must be sequences.
+#' @param kmer Kmer length. Default = 10


kmer_len instead of kmer?

renewiegandt · 2018-12-19T20:51:42Z

bin/reduce_sequence.R

 #' @param clean Delete all temporary files.
-#' @param minoverlap_kmer Minimum required overlap between kmer to merge kmer. Used to create reduced sequence ranges. Can not be greater than kmer length. Default = kmer - 1
+#' @param minoverlap_kmer Minimum required overlap between kmer. Used to create reduced sequence ranges out of merged kmer. Can not be greater than kmer length . Default = kmer - 1


Typos: 'length .' should be 'length.', and 'kmer' should be 'k-mers'.
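
To make the wording of `minoverlap_kmer` concrete, here is a minimal sketch of one plausible reading: k-mer hits of length `kmer`, given as 1-based start positions on a sequence, are merged into one range while consecutive k-mers share at least `minoverlap` bases. This is an assumption about the semantics, not the implementation in reduce_sequence.R; `merge_kmer_ranges()` is hypothetical.

```r
# Hypothetical sketch: merge k-mer hits of length `kmer` (1-based start
# positions) into ranges; two consecutive k-mers are joined only when they
# overlap by at least `minoverlap` bases.
merge_kmer_ranges <- function(starts, kmer, minoverlap = kmer - 1) {
  stopifnot(minoverlap <= kmer)  # overlap can not exceed the k-mer length
  starts <- sort(unique(starts))
  if (length(starts) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  range_start <- starts[1]
  range_end <- starts[1] + kmer - 1
  out_start <- integer(0)
  out_end <- integer(0)
  for (s in starts[-1]) {
    # all k-mers have the same length, so the previous k-mer ends at range_end
    overlap <- range_end - s + 1
    if (overlap >= minoverlap) {
      range_end <- s + kmer - 1      # extend the current range
    } else {
      out_start <- c(out_start, range_start)  # close the current range
      out_end <- c(out_end, range_end)
      range_start <- s
      range_end <- s + kmer - 1
    }
  }
  data.frame(start = c(out_start, range_start), end = c(out_end, range_end))
}
```

For example, with `kmer = 4` and `minoverlap = 3`, starts 1, 2, 3 and 10 yield the ranges 1-6 and 10-13. With the default `minoverlap = kmer - 1`, only k-mers whose starts are one base apart get merged.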

renewiegandt · 2018-12-19T20:54:11Z

bin/reduce_sequence.R

@@ -9,32 +9,41 @@ option_list <- list(
  make_option(opt_str = c("-t", "--threads"), default = 1, help = "Number of threads to use. Use 0 for all available cores. Default = %default", metavar = "integer"),
  make_option(opt_str = c("-c", "--clean"), default = TRUE, help = "Delete all temporary files. Default = %default", metavar = "logical"),
  make_option(opt_str = c("-s", "--min_seq_length"), default = NULL, help = "Remove sequences below this length. Defaults to the maximum value of motif and kmer and can not be lower.", metavar = "integer", type = "integer"),
-  make_option(opt_str = c("-n", "--minoverlap_kmer"), default = NULL, help = "Minimum required overlap between kmer to merge kmer. Used to create reduced sequence ranges. Can not be greater than kmer length. Default = kmer - 1", metavar = "integer", type = "integer"),
+  make_option(opt_str = c("-n", "--minoverlap_kmer"), default = NULL, help = "Minimum required overlap between kmer. Used to create reduced sequence ranges out of merged kmer. Can not be greater than kmer length. Default = kmer - 1", metavar = "integer", type = "integer"),


Typo: 'kmer' should be 'k-mers' (2x).
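
Several of the options above default to NULL and are then derived from other options, with constraints stated in the help texts. A minimal sketch of how they might be resolved after `parse_args()`, assuming the parsed list has elements named `kmer`, `motif`, `threads`, `min_seq_length` and `minoverlap_kmer` (the `kmer` and `motif` options are not visible in this diff); `resolve_options()` is hypothetical.

```r
# Hypothetical sketch: fill in the defaults that optparse leaves as NULL and
# enforce the constraints stated in the help texts.
resolve_options <- function(opt) {
  # --minoverlap_kmer: default kmer - 1, may not exceed the k-mer length
  if (is.null(opt$minoverlap_kmer)) opt$minoverlap_kmer <- opt$kmer - 1
  if (opt$minoverlap_kmer > opt$kmer) {
    stop("--minoverlap_kmer can not be greater than the k-mer length.",
         call. = FALSE)
  }

  # --min_seq_length: default and lower bound is max(motif, kmer)
  lower <- max(opt$motif, opt$kmer)
  if (is.null(opt$min_seq_length)) opt$min_seq_length <- lower
  if (opt$min_seq_length < lower) {
    stop("--min_seq_length can not be lower than ", lower, ".", call. = FALSE)
  }

  # --threads: 0 means all available cores; detectCores() may return NA
  if (opt$threads == 0) {
    cores <- parallel::detectCores()
    opt$threads <- if (is.na(cores)) 1L else cores
  }
  opt
}
```

Failing early with `stop(..., call. = FALSE)` keeps the error message short on the command line.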

renewiegandt · 2018-12-19T20:57:22Z

bin/cdhit_wrapper.R

@@ -68,24 +69,33 @@ opt <- parse_args(opt_parser)
 #' @param gat_ext Gap extension score. Default = -1 (CD-HIT parameter)
 #' @param sort_cluster_by_size Either sort cluster by decreasing length (= 0) or by decreasing size (= 1). Default = 1 (CD-HIT parameter)
 #' 
-#' TODO check whether cdhit is installed
+#' @details If there is a header supplied other then the default data.table nameing scheme ('V1', 'V2', etc.) it will be kept and extended.


Typo: 'nameing' should be 'naming' ("naming scheme").

renewiegandt · 2018-12-19T21:00:09Z

pipeline.nf

@@ -268,7 +268,7 @@ process overlap_with_known_TFBS {

 /*
 */
-process reduce_bed {
+process reduce_sequence {


Please add a short description to the processes reduce_sequence and clustering.

renewiegandt

Found a few typos and made a few suggestions for parameter names. Apparently 'kmer' is written 'k-mer'; you should take a look at that.
You could also add a short description for the processes reduce_sequence and clustering.
The changes in the code functionality look good to me.

HendrikSchultheis added 10 commits December 19, 2018 13:47

refactoring; renamed reduce_bed to reduce_sequence (98985d1)
check whether jellyfish is installed (e0b9d38)
reduce_bed renamed to reduce_sequence (1730868)
check whether jellyfish is installed (88fa298)
check whether cdhit is installed (e17d1db)
omit TODO (dcd185e)
check for header and forward it if provided (4c16f6f)
automatically detect and keep column names if provided (5a7c84e)
added author; better missing input error (97464ca)
added author (cc532bf)

HendrikSchultheis requested a review from renewiegandt December 19, 2018 16:34

HendrikSchultheis mentioned this pull request Dec 19, 2018 in ToDo List #10 (open, 35 tasks)

renewiegandt reviewed Dec 19, 2018

renewiegandt requested changes Dec 19, 2018

HendrikSchultheis added 4 commits December 20, 2018 13:21

spell check (d60faa7)
spell check (6507643)
fixed more typos (d86f788)
process description for reduce_sequence and clustering (756e98f)

HendrikSchultheis self-assigned this Dec 20, 2018

HendrikSchultheis added the enhancement label ("New feature or request") Dec 20, 2018

renewiegandt approved these changes Dec 20, 2018

HendrikSchultheis merged commit 935ba3f into dev Dec 21, 2018
